{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "3bbc9a0e",
   "metadata": {
    "id": "3bbc9a0e"
   },
   "source": [
    "<center>\n",
    "    <p style=\"text-align:center\">\n",
    "        <img alt=\"phoenix logo\" src=\"https://storage.googleapis.com/arize-phoenix-assets/assets/phoenix-logo-light.svg\" width=\"200\"/>\n",
    "        <br>\n",
    "        <a href=\"https://arize.com/docs/phoenix/\">Docs</a>\n",
    "        |\n",
    "        <a href=\"https://github.com/Arize-ai/phoenix\">GitHub</a>\n",
    "        |\n",
    "        <a href=\"https://arize-ai.slack.com/join/shared_invite/zt-2w57bhem8-hq24MB6u7yE_ZF_ilOYSBw#/shared-invite/email\">Community</a>\n",
    "    </p>\n",
    "</center>\n",
    "<h1 align=\"center\">Multimodal LLM Ops - Tracing, Evaluation, and Analysis of Multimodal Models</h1>\n",
    "\n",
    "In this notebook, we show how to use a Mult-Modal LLM, i.e., OpenAI's `gpt-4o`, to ask questions about images (image reasoning) using the chat API. In addition, we use Arize's Phoenix and OpenInference AutoInstrumentor to trace the operation.\n",
    "\n",
    "- Framework: [LlamaIndex](https://github.com/run-llama/llama_index)\n",
    "- LLM: OpenAI's GPT-4o\n",
    "- LLM Observability: [Arize Phoenix](https://phoenix.arize.com/) ([GitHub](https://github.com/Arize-ai/phoenix))\n",
    "- LLM Tracing: Arize's [OpenInference](https://arize-ai.github.io/openinference/) [Auto-Instrumentor](https://github.com/Arize-ai/openinference/tree/main/python/instrumentation/openinference-instrumentation-llama-index)\n",
    "\n",
    "Steps:\n",
    "1. Install dependencies\n",
    "2. Setup Tracing\n",
    "2. Download Images from Tesla\n",
    "3. Setup the Multi-Modal LLM application\n",
    "4. Use the Multi-Modal LLM application\n",
    "\n",
    "⚠️ This tutorial requires an OpenAI key to run"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9dd3b863-2bc4-4efd-a40e-d637b54a758b",
   "metadata": {},
   "source": [
    "## Install dependencies"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "647fc5c5",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Observability & Tracing dependencies\n",
    "%pip install -qq \"arize-phoenix>=4.30.2\" \"openinference-instrumentation-llama-index==2.2.4\"\n",
    "# Framework dependencies\n",
    "%pip install -qq \"llama-index==0.10.68\"\n",
    "# Other dependencies: so that we can show and understand the images in this notebook\n",
    "%pip install -qq matplotlib"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d60540e6-61e7-4651-81e8-ed646d5d08f7",
   "metadata": {},
   "source": [
    "## Setup Tracing"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ce68d6d1-d15d-4414-851c-63d353ad382c",
   "metadata": {},
   "source": [
    "First, we launch the phoenix app, which will act as an OTEL collector of the generated spans. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "6a8c56cb-e773-4ac6-92ad-e169da14ae7e",
   "metadata": {},
   "outputs": [],
   "source": [
    "import phoenix as px\n",
    "\n",
    "px.launch_app()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f21c155e-8aba-474b-a98f-414a6e5b466d",
   "metadata": {},
   "source": [
    "Next, we setup the tracing by declaring a tracer provider and a span processor with an OTLP span exporter:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "142bb4be-06e5-43db-ba8b-f8a3e1e7685e",
   "metadata": {},
   "outputs": [],
   "source": [
    "from phoenix.otel import register\n",
    "\n",
    "tracer_provider = register(endpoint=\"http://127.0.0.1:6006/v1/traces\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c16354fb-eaac-4fe6-af05-2f2c4938994c",
   "metadata": {},
   "source": [
    "Finally, we use OpenInference's Llama-Index auto-instrumentor."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d2b0dee7-10a3-47f7-855a-1b3276c63380",
   "metadata": {},
   "outputs": [],
   "source": [
    "from openinference.instrumentation import TraceConfig\n",
    "from openinference.instrumentation.llama_index import LlamaIndexInstrumentor\n",
    "\n",
    "config = TraceConfig(base64_image_max_length=100_000_000)\n",
    "LlamaIndexInstrumentor().instrument(\n",
    "    tracer_provider=tracer_provider,\n",
    "    config=config,\n",
    "    skip_dep_check=True,\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4ad0c4d3-bb0d-4ae2-a08d-bee8b72b59ef",
   "metadata": {},
   "source": [
    "That's it! With these 2 cells you have correctly set up the tracing of your Llama-Index application. As you use this application, spans will be exported to Phoenix for observability and analysis."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7683e4ed",
   "metadata": {
    "id": "7683e4ed"
   },
   "source": [
    "## Download images from Tesla's website"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b383f38e",
   "metadata": {},
   "outputs": [],
   "source": [
    "from pathlib import Path\n",
    "\n",
    "input_image_path = Path(\"input_images\")\n",
    "if not input_image_path.exists():\n",
    "    Path.mkdir(input_image_path)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "051d6825",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%capture\n",
    "!wget \"https://docs.google.com/uc?export=download&id=1nUhsBRiSWxcVQv8t8Cvvro8HJZ88LCzj\" -O ./input_images/long_range_spec.png\n",
    "!wget \"https://docs.google.com/uc?export=download&id=19pLwx0nVqsop7lo0ubUSYTzQfMtKJJtJ\" -O ./input_images/model_y.png\n",
    "!wget \"https://docs.google.com/uc?export=download&id=1utu3iD9XEgR5Sb7PrbtMf1qw8T1WdNmF\" -O ./input_images/performance_spec.png\n",
    "!wget \"https://docs.google.com/uc?export=download&id=1dpUakWMqaXR4Jjn1kHuZfB0pAXvjn2-i\" -O ./input_images/price.png\n",
    "!wget \"https://docs.google.com/uc?export=download&id=1qNeT201QAesnAP5va1ty0Ky5Q_jKkguV\" -O ./input_images/real_wheel_spec.png"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0ffb1327",
   "metadata": {
    "id": "0ffb1327"
   },
   "source": [
    "Next, we simply plot the images so you know what we just downloaded"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e5a64bb6",
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "\n",
    "import matplotlib.pyplot as plt\n",
    "from PIL import Image\n",
    "\n",
    "image_paths = []\n",
    "for img_path in os.listdir(\"./input_images\"):\n",
    "    image_paths.append(str(os.path.join(\"./input_images\", img_path)))\n",
    "\n",
    "\n",
    "def plot_images(image_paths):\n",
    "    images_shown = 0\n",
    "    plt.figure(figsize=(16, 9))\n",
    "    for img_path in image_paths:\n",
    "        if os.path.isfile(img_path):\n",
    "            image = Image.open(img_path)\n",
    "\n",
    "            plt.subplot(2, 3, images_shown + 1)\n",
    "            plt.imshow(image)\n",
    "            plt.xticks([])\n",
    "            plt.yticks([])\n",
    "\n",
    "            images_shown += 1\n",
    "            if images_shown >= 9:\n",
    "                break\n",
    "\n",
    "\n",
    "plot_images(image_paths)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4e54694f",
   "metadata": {
    "id": "4e54694f"
   },
   "source": [
    "## Setup the Multi-Modal LLM application"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e3f82122-4dba-41ed-b595-b8abdf753ffa",
   "metadata": {},
   "source": [
    "First things first, we need an OpenAI (our LLM provider) API key."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b8959291-480e-4f0a-be06-c5491c26a495",
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "from getpass import getpass\n",
    "\n",
    "if not (openai_api_key := os.getenv(\"OPENAI_API_KEY\")):\n",
    "    openai_api_key = getpass(\"🔑 Enter your OpenAI API key: \")\n",
    "\n",
    "os.environ[\"OPENAI_API_KEY\"] = openai_api_key"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f8032b2c-337a-4dc1-9d58-0fc1b6673a02",
   "metadata": {},
   "source": [
    "Last, we need to declare our OpenAI from the `multi_nmodal_llms` module and use a `SimpleDirectoryReader` to have access to the downloaded images"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "836b5a48-af5f-41bb-ba4a-be529ceaccbd",
   "metadata": {},
   "outputs": [],
   "source": [
    "from llama_index.core import SimpleDirectoryReader\n",
    "from llama_index.multi_modal_llms.openai import OpenAIMultiModal\n",
    "\n",
    "# put your local directory here\n",
    "image_documents = SimpleDirectoryReader(\"./input_images\").load_data()\n",
    "\n",
    "openai_mm_llm = OpenAIMultiModal(\n",
    "    model=\"gpt-4o\",\n",
    "    max_new_tokens=1500,\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9d90508c-e6cf-482e-a4b0-33cc9117b1ba",
   "metadata": {},
   "source": [
    "## Use the Multi-Modal LLM application"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "fbb94d45-4b2e-41e3-a1ab-46274040365b",
   "metadata": {},
   "source": [
    "We set up multimodal chat messages and call the LLM"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "4809a8b7-2d13-46c9-b14b-f1d817d1a12b",
   "metadata": {},
   "outputs": [],
   "source": [
    "from llama_index.multi_modal_llms.openai.utils import (\n",
    "    generate_openai_multi_modal_chat_message,\n",
    ")\n",
    "\n",
    "# Setup first message: a question about the passed image documents\n",
    "message_1 = generate_openai_multi_modal_chat_message(\n",
    "    prompt=\"Describe the images as an alternative text\",\n",
    "    role=\"user\",\n",
    "    image_documents=image_documents,\n",
    ")\n",
    "\n",
    "# Call the LLM for a response to the question\n",
    "response_1 = openai_mm_llm.chat(\n",
    "    messages=[message_1],\n",
    ")\n",
    "\n",
    "print(response_1)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "997e06e8-9e9a-42b8-876b-8be97c3fb162",
   "metadata": {},
   "source": [
    "We can also simulate a conversation by passing the response as a message from the \"assistant\", and ask further questions"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d0f25fb1",
   "metadata": {},
   "outputs": [],
   "source": [
    "message_2 = generate_openai_multi_modal_chat_message(\n",
    "    prompt=response_1.message.content,\n",
    "    role=\"assistant\",\n",
    ")\n",
    "\n",
    "message_3 = generate_openai_multi_modal_chat_message(\n",
    "    prompt=\"Can you tell me what the price of each spec as well?\",\n",
    "    role=\"user\",\n",
    "    image_documents=image_documents,\n",
    ")\n",
    "response_2 = openai_mm_llm.chat(\n",
    "    messages=[\n",
    "        message_1,\n",
    "        message_2,\n",
    "        message_3,\n",
    "    ],\n",
    ")\n",
    "\n",
    "print(response_2)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2292cb35-c437-4935-96a1-bfc137654771",
   "metadata": {},
   "source": [
    "Let's try make the last question more difficult. We can ask the last question without directly passing the image documents. Sometimes the LLM will rembemer the images passed in the first message and responde correctly. However, some other times it will be unaware of them and give an incomplete or wrong answer."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "8a30b180-26a8-4570-94fa-85dfe8e2f4fc",
   "metadata": {},
   "outputs": [],
   "source": [
    "message_3_no_images = generate_openai_multi_modal_chat_message(\n",
    "    prompt=\"Can you tell me what the price of each spec as well?\",\n",
    "    role=\"user\",\n",
    ")\n",
    "response_3 = openai_mm_llm.chat(\n",
    "    messages=[\n",
    "        message_1,\n",
    "        message_2,\n",
    "        message_3_no_images,\n",
    "    ],\n",
    ")\n",
    "\n",
    "print(response_3)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e61a359f-889b-4d4c-9522-1d263d175f18",
   "metadata": {},
   "source": [
    "## Observability"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "715fe9f6-b0b6-40f6-8a07-14202bbb4acc",
   "metadata": {},
   "source": [
    "Now that we've run the application a couple of times, let's take a look at the traces in the UI:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "3f2b3eae-62b2-4ee7-8882-6868f5537957",
   "metadata": {},
   "outputs": [],
   "source": [
    "print(\"The Phoenix UI:\", px.active_session().url)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c8333fe6-0121-4633-ba75-2547cd3d7e20",
   "metadata": {},
   "source": [
    "The UI will give you an interactive troubleshooting experience. You can sort, filter, and search for traces. You can also view the questions asked and the images in the message. For instance you can see how in the second trace, the images were passed to every user message, but in the third trace only the first message had the images attached."
   ]
  }
 ],
 "metadata": {
  "language_info": {
   "name": "python"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
